1 External memory algorithms and data structures: dealing with massive data

Jeffrey Scott Vitter 



June 2001 ACM Computing Surveys (CSUR), volume 33 issue 2 
Publisher: ACM Press 

Full text available: pdf(828.46 KB) Additional Information: full citation, abstract, references, citings, index terms



Data sets in large applications are often too massive to fit completely inside the computer's
internal memory. The resulting input/output communication (or I/O) between fast internal 
memory and slower external memory (such as disks) can be a major performance 
bottleneck. In this article we survey the state of the art in the design and analysis of 
external memory (or EM) algorithms and data structures, where the goal is to exploit 
locality in order to reduce the I/O costs. We consider a varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external 
memory, hierarchical memory, multidimensional access methods, multilevel memory, 
online, out-of-core, secondary storage, sorting 



2 Scalable parallel allocation: Scalable locality-conscious multithreaded memory allocation

Scott Schneider, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos 
June 2006 Proceedings of the 2006 international symposium on Memory management ISMM '06
Publisher: ACM Press 

Full text available: pdf(267.12 KB) Additional Information: full citation, abstract, references, index terms

We present Streamflow, a new multithreaded memory manager designed for low 
overhead, high-performance memory allocation while transparently favoring locality. 
Streamflow enables low-overhead simultaneous allocation by multiple threads and adapts
to sequential allocation at speeds comparable to that of custom sequential allocators. It 
favors the transparent exploitation of temporal and spatial object access locality, and 
reduces allocator-induced cache conflicts and false sharing, all using a un ... 

Keywords: memory management, multithreading, non-blocking, shared memory, 
synchronization-free 
3 The Alpine file system
M. R. Brown, K. N. Kolling, E. A. Taft 

November 1985 ACM Transactions on Computer Systems (TOCS), volume 3 issue 4 
Publisher: ACM Press 

Full text available: pdf(2.95 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Alpine is a file system that supports atomic transactions and is designed to operate as a 
service on a computer network. Alpine's primary purpose is to store files that represent 
databases. An important secondary goal is to store ordinary files representing documents, 
program modules, and the like. Unlike other file servers described in the literature, Alpine 
uses a log-based technique to implement atomic file update. Another unusual aspect of 
Alpine is that it performs all commu ... 

4 Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
David J. Lilja

September 1993 ACM Computing Surveys (CSUR), volume 25 issue 3 
Publisher: ACM Press 

Full text available: pdf(3.12 MB) Additional Information: full citation, references, citings, index terms



5 System-level power optimization: techniques and tools
Luca Benini, Giovanni de Micheli

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 5 Issue 2
Publisher: ACM Press 

Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We consider 
electronic systems consisting of a hardware platform and software layers. We consider the
three major constituents of hardware that consume energy, namely computation, 
communication, and storage units, and we review methods of reducing their energy 
consumption. We also study models for analyzing the energy cost of software, and 
methods for energy-efficient software design and compilation. This survey ...

6 Trace-driven memory simulation: a survey
Richard A. Uhlig, Trevor N. Mudge 

June 1997 ACM Computing Surveys (CSUR), Volume 29 Issue 2 
Publisher: ACM Press 

Full text available: pdf(636.11 KB) Additional Information: full citation, abstract, references, citings, index terms, review

As the gap between processor and memory speeds continues to widen, methods for 
evaluating memory system designs before they are implemented in hardware are 
becoming increasingly important. One such method, trace-driven memory simulation, has 
been the subject of intense interest among researchers and has, as a result, enjoyed rapid 
development and substantial improvements during the past decade. This article surveys 
and analyzes these developments by establishing criteria for evaluating trac ... 

Keywords: TLBs, caches, memory management, memory simulation, trace-driven 
simulation 
Loren P. Meissner
December 1989 ACM SIGPLAN Fortran Forum, volume 8 issue 4
Publisher: ACM Press 

Full text available: pdf(21.36 MB) Additional Information: full citation, abstract, index terms

Standard Programming Language Fortran. This standard specifies the form and 
establishes the interpretation of programs expressed in the Fortran language. It consists of
the specification of the language Fortran. No subsets are specified in this standard. The 
previous standard, commonly known as "FORTRAN 77", is entirely contained within this 
standard, known as "Fortran 8x". Therefore, any standard-conforming FORTRAN 77 
program is standard conforming under this standard. New features can b ... 

8 Disk cache—miss ratio analysis and design considerations
Alan J. Smith

August 1985 ACM Transactions on Computer Systems (TOCS), volume 3 issue 3 
Publisher: ACM Press 

Full text available: pdf(3.13 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The current trend of computer system technology is toward CPUs with rapidly increasing 
processing power and toward disk drives of rapidly increasing density, but with disk 
performance increasing very slowly if at all. The implication of these trends is that at some 
point the processing power of computer systems will be limited by the throughput of the 
input/output (I/O) system. A solution to this problem, which is described and evaluated in 
this paper, is disk cache 

9 File servers for network-based distributed systems
Liba Svobodova 

December 1984 ACM Computing Surveys (CSUR), volume 16 issue 4
Publisher: ACM Press 

Full text available: pdf(4.23 MB) Additional Information: full citation, references, citings, index terms, review



10 Dataflow machine architecture 
Arthur H. Veen 

December 1986 ACM Computing Surveys (CSUR), volume 18 issue 4 
Publisher: ACM Press 

Full text available: pdf(3.19 MB) Additional Information: full citation, abstract, references, citings, index terms

Dataflow machines are programmable computers of which the hardware is optimized for 
fine-grain data-driven parallel computation. The principles and complications of data- 
driven execution are explained, as well as the advantages and costs of fine-grain 
parallelism. A general model for a dataflow machine is presented and the major design 
options are discussed. Most dataflow machines described in the literature are surveyed on 
the basis of this model and its associated technology. F ... 

11 HFS: a performance-oriented flexible file system based on building-block compositions
Orran Krieger, Michael Stumm

August 1997 ACM Transactions on Computer Systems (TOCS), volume 15 issue 3
Publisher: ACM Press 

Full text available: pdf(383.87 KB) Additional Information: full citation, abstract, references, citings, index terms, review

The Hurricane File System (HFS) is designed for (potentially large-scale) shared-memory 
multiprocessors. Its architecture is based on the principle that, in order to maximize
performance for applications with diverse requirements, a file system must support a wide 
variety of file structures, file system policies, and I/O interfaces. Files in HFS are 
implemented using simple building blocks composed in potentially complex ways. This 
approach yields great flexibility, allowing an application ... 

Keywords: customization, data partitioning, data replication, flexibility, parallel 
computing, parallel file system 



12 Characterizing the caching and synchronization performance of a multiprocessor operating system
Josep Torrellas, Anoop Gupta, John Hennessy

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems ASPLOS-V, volume 27 issue 9
Publisher: ACM Press 

Full text available: pdf(1.52 MB) Additional Information: full citation, references, citings, index terms



13 Fast detection of communication patterns in distributed executions 
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 
Studies on Collaborative research 

Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the execution 
of the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not 
provide the user with the desired overview of the application. In our experience, such tools 
display repeated occurrences of non-trivial commun ... 

14 Specifying Java thread semantics using a uniform memory model
Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom 

November 2002 Proceedings of the 2002 joint ACM-ISCOPE conference on Java 
Grande 

Publisher: ACM Press 

Full text available: pdf(202.03 KB) Additional Information: full citation, abstract, references, index terms

Standardized language level support for threads is one of the most important features of 
Java. However, defining the Java Memory Model (JMM) has turned out to be a major 
challenge. Several models produced to date are not as easily comprehensible and 
comparable as first thought. Given the growing interest in multithreaded Java 
programming, it is essential to have a sound framework that would allow formal 
specification and reasoning about the JMM. This paper presents the Uniform Memory Model 
(UMM), ... 

Keywords: Java, compilation, memory models, threads, verification 
Chun Xia, Josep Torrellas

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd annual international symposium on Computer architecture ISCA '96, volume 24 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms

High-performing on-chip instruction caches are crucial to keep fast processors busy. 
Unfortunately, while on-chip caches are usually successful at intercepting instruction 
fetches in loop-intensive engineering codes, they are less able to do so in large systems 
codes. To improve the performance of the latter codes, the compiler can be used to lay out 
the code in memory for reduced cache conflicts. Interestingly, such an operation leaves 
the code in a state that can be exploited by a new type of ... 

16 Synchronization models: Composable memory transactions
Tim Harris, Simon Marlow, Simon Peyton-Jones, Maurice Herlihy 
June 2005 Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Publisher: ACM Press 

Full text available: pdf(239.37 KB) Additional Information: full citation, abstract, references, index terms

Writing concurrent programs is notoriously difficult, and is of increasing practical 
importance. A particular source of concern is that even correctly-implemented concurrency 
abstractions cannot be composed together to form larger abstractions. In this paper we 
present a new concurrency model, based on transactional memory, that offers far richer 
composition. All the usual benefits of transactional memory are present (e.g. freedom 
from deadlock), but in addition we describe new modular fo ... 

Keywords: locks, non-blocking algorithms, transactions 




17 Memory access scheduling

Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, John D. Owens 

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th annual international symposium on Computer architecture ISCA '00, volume 28 Issue 2
Publisher: ACM Press 

Full text available: pdf(181.84 KB) Additional Information: full citation, abstract, references, citings, index terms

The bandwidth and latency of a memory system are strongly dependent on the manner in 
which accesses interact with the "3-D" structure of banks, rows, and columns 
characteristic of contemporary DRAM chips. There is nearly an order of magnitude 
difference in bandwidth between successive references to different columns within a row 
and different rows within a bank. This paper introduces memory access scheduling, a 
technique that improves the performance of ... 

18 Parallelizing nonnumerical code with selective scheduling and software pipelining
Soo-Mook Moon, Kemal Ebcioglu 

November 1997 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 19 Issue 6
Publisher: ACM Press 

Full text available: pdf(543.93 KB) Additional Information: full citation, abstract, references, citings, index terms

Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to 
exploit due to its irregularity. In this article, we introduce a new code-scheduling technique 
for irregular ILP called "selective scheduling" which can be used as a component for 
superscalar and VLIW compilers. Selective scheduling can compute a wide set of
independent operations across all execution paths based on renaming and forward- ...

Keywords: VLIW, global instruction scheduling, instruction-level parallelism, software
pipelining, speculative code motion, superscalar 

19 Functional-join processing
R. Braumandl, J. Claussen, A. Kemper, D. Kossmann 

February 2000 The VLDB Journal — The International Journal on Very Large Data Bases, Volume 8 Issue 3-4
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(486.22 KB) Additional Information: full citation, abstract, citings, index terms

Inter-object references are one of the key concepts of object-relational and object- 
oriented database systems. In this work, we investigate alternative techniques to 
implement inter-object references and make the best use of them in query processing, 
i.e., in evaluating functional joins. We will give a comprehensive overview and 
performance evaluation of all known techniques for simple (single-valued) as well as 
multi-valued functional joins. Furthermore, we will describe special order-preser ...

Keywords: Functional join, Logical OID, Object identifier, Order-preserving join, Physical 
OID, Pointer join, Query processing 

20 Energy-aware design of embedded memories: A survey of technologies, architectures, and optimization techniques
Luca Benini, Alberto Macii, Massimo Poncino

February 2003 ACM Transactions on Embedded Computing Systems (TECS), volume 2 Issue 1
Publisher: ACM Press 

Full text available: pdf(288.44 KB) Additional Information: full citation, abstract, references, index terms

Embedded systems are often designed under stringent energy consumption budgets, to 
limit heat generation and battery size. Since memory systems consume a significant 
amount of energy to store and to forward data, it is then imperative to balance power 
consumption and performance in memory system design. Contemporary system design 
focuses on the trade-off between performance and energy consumption in processing and 
storage units, as well as in their interconnections. Although memory design is as ... 